Robot recognizes three simultaneous speech by active audition

Authors

  • Kazuhiro Nakadai
  • Hiroshi G. Okuno
  • Hiroaki Kitano
Abstract

Robots should listen to and recognize speeches with their own ears under noisy environments and simultaneous speeches to attain smooth communications with people in a real world. This paper presents three simultaneous speech recognition based on active audition, which integrates audition with motion. Our robot audition system consists of three modules: a real-time human tracking system, an active direction-pass filter (ADPF), and a speech recognition system using multiple acoustic models. The real-time human tracking system realizes robust and accurate sound source localization and tracking by audio-visual integration. The performance of localization shows that the resolution at the center of the robot is much higher than that at the periphery. We call this phenomenon "auditory fovea" because it is similar to the visual fovea (high resolution in the center of the human eye). Active motions such as being directed at the sound source improve localization because they make the best use of the auditory fovea. The ADPF realizes accurate and fast sound separation by using a pair of microphones. The ADPF separates sounds originating from the specified direction obtained by the real-time human tracking system. Because the performance of separation depends on the accuracy of localization, the extraction of sound from the front direction is more accurate than that of sound from the periphery. This means that the pass range of the ADPF should be narrower in the front direction than in the periphery. In other words, such active pass range control improves sound separation. The separated speech is recognized by the speech recognition system using multiple acoustic models, which integrates multiple results to output the result with the maximum likelihood. Active motions such as being directed at a sound source improve speech recognition because they realize not only improvement of sound extraction but also easier integration of the results using face ID by face recognition.
The robot audition system improved by active audition is implemented on an upper-torso humanoid. The system attains localization, separation, and recognition of three simultaneous speeches, and the results prove the efficiency of active audition.
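
As a rough illustration of the active pass range control described in the abstract, the following Python sketch makes the pass range narrow in the front direction and wider toward the periphery, then keeps only the sub-bands whose estimated direction falls inside that range. The function names, the linear interpolation, and the 10 and 30 degree half-widths are assumptions chosen for illustration; the abstract does not give these details.

```python
def pass_range_deg(azimuth_deg, front_width=10.0, side_width=30.0):
    """Half-width of the pass range in degrees (hypothetical values).

    The range grows linearly from front_width at 0 degrees (front)
    to side_width at 90 degrees (periphery).
    """
    a = min(abs(azimuth_deg), 90.0)
    return front_width + (side_width - front_width) * a / 90.0


def select_subbands(subband_azimuths, target_azimuth):
    """Return indices of sub-bands whose estimated direction lies
    within the pass range centered on the target direction."""
    width = pass_range_deg(target_azimuth)
    return [
        i
        for i, az in enumerate(subband_azimuths)
        if abs(az - target_azimuth) <= width
    ]


if __name__ == "__main__":
    # Estimated direction (degrees) of each frequency sub-band.
    estimated = [0.0, 5.0, 12.0, -8.0, 40.0]
    print(select_subbands(estimated, target_azimuth=0.0))
```

With a target straight ahead, only sub-bands within the narrow front range are kept; the same call with a target at 90 degrees would accept a wider spread of directions, which mirrors the lower localization accuracy at the periphery.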

Similar resources

Improving speech recognition of two simultaneous speech signals by integrating ICA BSS and automatic missing feature mask generation

Robot audition systems require capabilities for sound source separation and the recognition of separated sounds, since we hear a mixture of sounds in our daily lives, especially mixtures of speech. We report a robot audition system with a pair of omni-directional microphones embedded in a humanoid that recognizes two simultaneous talkers. It first separates the sound sources by Independent Compone...

Three simultaneous speech recognition by integration of active audition and face recognition for humanoid

This paper addresses listening to three simultaneous talkers by a humanoid with two microphones. In such situations, sound separation and automatic speech recognition (ASR) of the separated speech are difficult, because the number of simultaneous talkers exceeds that of its microphones, the signal-to-noise ratio is quite low (around -3 dB) and noise is not stable due to interfering voices. Huma...

Design and Implementation of Robot Audition System 'HARK' - Open Source Software for Listening to Three Simultaneous Speakers

This paper presents the design and implementation of the HARK robot audition software system consisting of sound source localization modules, sound source separation modules and automatic speech recognition modules of separated speech signals that works on any robot with any microphone configuration. Since a robot with ears may be deployed to various auditory environments, the robot audition sy...

A real-time super-resolution robot audition system that improves the robustness of simultaneous speech recognition

This study addresses a framework for a robot audition system, including sound source localization (SSL) and sound source separation (SSS), that can robustly recognize simultaneous speeches in a real environment. Because SSL estimates not only the location of speakers but also the number of speakers, such a robust framework is essential for simultaneous speech recognition. Moreover, improvement ...

Simultaneous Speech Recognition Based on Automatic Missing Feature Mask Generation by Integrating Sound Source Separation

Our goal is to realize a humanoid robot that has the capability of recognizing simultaneous speech. A humanoid robot under real-world environments usually hears a mixture of sounds, and thus three capabilities are essential for robot audition: sound source localization, separation, and recognition of separated sounds. In particular, an interface between sound source separation and speech reco...

Publication date: 2003